Add the actual XRay remote sampler which polls rules and orders / app… #3343

anuraaga · 2021-06-24T06:48:26Z

…lies them.

This implements the core part of XRay remote sampling, which polls rules and applies a fixed rate based on matching rules. This is the most important part, but in followups I will also be adding support for XRay's "centralized reservoir" (where the server handles the reservoir quotas instead of being fixed per client, this is what the GetSamplingTargets RPC is used for). @jkwatson You aint seen complicated yet :P

…lies them.

codecov · 2021-06-24T06:51:39Z

Codecov Report

Merging #3343 (256b441) into main (fb7e8d2) will increase coverage by 18.46%.
The diff coverage is n/a.

❗ Current head 256b441 differs from pull request most recent head 2254dd7. Consider uploading reports for the commit 2254dd7 to get more accurate results

@@             Coverage Diff             @@
##             main    #3343       +/-   ##
===========================================
+ Coverage        0   18.46%   +18.46%     
- Complexity      0      721      +721     
===========================================
  Files           0      351      +351     
  Lines           0     9618     +9618     
  Branches        0      972      +972     
===========================================
+ Hits            0     1776     +1776     
- Misses          0     7657     +7657     
- Partials        0      185      +185

Impacted Files	Coverage Δ
...emetry/exporter/logging/LoggingMetricExporter.java	`0.00% <0.00%> (ø)`
...er/logging/otlp/OtlpJsonLoggingMetricExporter.java	`0.00% <0.00%> (ø)`
.../java/io/opentelemetry/sdk/internal/TestClock.java	`0.00% <0.00%> (ø)`
...io/opentelemetry/context/ImplicitContextKeyed.java	`100.00% <0.00%> (ø)`
...sion/trace/jaeger/sampler/JaegerRemoteSampler.java	`80.55% <0.00%> (ø)`
.../metrics/aggregator/DoubleHistogramAggregator.java	`0.00% <0.00%> (ø)`
...s/AbstractDoubleAsynchronousInstrumentBuilder.java	`0.00% <0.00%> (ø)`
...pentelemetry/api/metrics/DefaultMeterProvider.java	`0.00% <0.00%> (ø)`
...er/otlp/trace/InstrumentationLibraryMarshaler.java	`0.00% <0.00%> (ø)`
...src/main/java/io/opentelemetry/api/trace/Span.java	`31.70% <0.00%> (ø)`
... and 341 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb7e8d2...2254dd7. Read the comment docs.

ghost · 2021-06-24T12:03:19Z

...ensions/aws/src/main/java/io/opentelemetry/sdk/extension/aws/trace/AwsXrayRemoteSampler.java

+    return new AwsXrayRemoteSamplerBuilder(resource);
+  }
+
+  AwsXrayRemoteSampler(


NIT: this is my defensive thinking, but as the constructor is package-protected and thus accessible, I'd validate constructor parameters here.

I think they're all validated in the builder

ghost · 2021-06-24T12:05:50Z

...ensions/aws/src/main/java/io/opentelemetry/sdk/extension/aws/trace/AwsXrayRemoteSampler.java

+
+    pollFuture =
+        executor.scheduleAtFixedRate(
+            this::getAndUpdateSampler, 0, pollingIntervalNanos, TimeUnit.NANOSECONDS);


I believe that pooling interval should have a sanity check - that is if the value is outside of typical, expected values, a warning should be logged. For me it's easy to imagine sb setting interval to 1 (assuming it's one second) and degrading the performance with request each nanosecond ;)

it looks like all the validation is down in the builder.

ghost · 2021-06-24T12:31:27Z

...ensions/aws/src/main/java/io/opentelemetry/sdk/extension/aws/trace/SamplingRulesSampler.java

+
+    // In practice, X-Ray always returns a Default rule that matches all requests so it is a bug in
+    // our code or X-Ray to reach here, fallback just in case.
+    logger.log(


Hm. In the case this happens, it will totally spam the logs at WARNING for every span. I know it shouldn't ever happen, but should we maybe not be so aggressive with the logging here?

True that the fallback loses its meaning if it adds destructive spam on every single span, turned it to FINE (thought about throttling logger but will lose access to it in contrib repo 😿 )

jkwatson · 2021-06-24T14:58:36Z

You aint seen complicated yet :P

Hm. If that's the case, I'm definitely starting to question if this repository is the right place to host the code. Can AWS publish a library with all this code in it that we can just depend on, rather than having it all live here (as well in wherever it lives for the old AWS java instrumentation)?

anuraaga · 2021-06-24T15:16:18Z

I think code for otel components that interface with vendors generally go in an otel contrib repo. We've been treating the sdk-extensions as a sort of contrib but maybe we should move to java-contrib? Not sure if it makes a difference in the actual scheme of things though - if the issue is review load a better option (though not mutually exclusive) is probably to add @kubawach to approvers finally :)

jkwatson · 2021-06-24T15:24:17Z

I think code for otel components that interface with vendors generally go in an otel contrib repo. We've been treating the sdk-extensions as a sort of contrib but maybe we should move to java-contrib? Not sure if it makes a difference in the actual scheme of things though - if the issue is review load a better option (though not mutually exclusive) is probably to add @kubawach to approvers finally :)

The issue isn't review load (but I'm also in favor of getting @kubawach added to approvers), but potential maintenance & support load over the long term. If you are the only person who is really able to maintain the code, and debug issues with its interaction with AWS services, then that feels like the core implementation details probably don't belong here. A simple AWS propagator is one thing, but when we're talking about maintaining several thousand lines of non-trivial code for an AWS-specific sampler implementation, I'm probably against it.

…nto xray-remote-sampler

jkwatson · 2021-06-25T14:56:09Z

...ensions/aws/src/main/java/io/opentelemetry/sdk/extension/aws/trace/AwsXrayRemoteSampler.java

+    this.resource = resource;
+    this.initialSampler = initialSampler;
+    client = new XraySamplerClient(endpoint);
+    executor =


personally, I'd just assign this in the declaration above, since it's final and doesn't depend on params.

I tend to like either all assigned as fields or all in constructor, otherwise it seems like an ugly duck field

jkwatson

Thanks! Looks good enough to merge, then move over to the contrib repo.

jmacd

👀

Speaking for myself (and my employer), we have a strong interest in being able to approximately count sampled spans. I am also confident we can do this counting correctly and with a specification, as discussed in open-telemetry/oteps#148. From this perspective, it's the "Priority", "FixedRate", and "ReservoirSize" that are involved in sampling probability. As discussed in OTEP 148 Samplers are capable of producing probabilities for fixed rate sampling, we'll just have to replace Leaky-bucket rate limiting with Varopt reservoir sampling.

I'd like to draw a non-blocking connection between the XRay sampler and the discussion about Views taking place in the Metrics SIG. Sampler configurations are like views on spans, and as long as we're able to count them, they are also views that can produce metrics. The big question ia whether this configuration syntax can become a standard for sampling in OTel's default SDKs, and if so--how is configuration is supplied?

anuraaga · 2021-06-26T02:57:09Z

@jmacd Thanks - I agree that any rate limiting can be made compatible with the need for counting spans by tweaking the algorithm, perhaps to Varopt. In the meantime I'll be starting with the same as Jaeger but will be happy to change in the future.

The big question ia whether this configuration syntax can become a standard for sampling in OTel's default SDKs, and if so--how is configuration is supplied?

Configuration is basically just data, not that much unlike even spans - just the data that's sent is about configuring an app, not an event. I'd expect we define a proto and transmit it over gRPC just like OTLP, there doesn't seem to be a reason to break from convention. What's in the proto will be a finer nuance - I wouldn't expect to crib much from the X-Ray data model for sampling since it's not very intuitive / well designed I think. In particular, reasoning out the result of having both rate limiting and fixed rate possible at the same time is mind blowing. Adding attribute matching to the jaeger data model is probably a better start.

Add the actual XRay remote sampler which polls rules and orders / app…

1f2f1e5

…lies them.

anuraaga requested review from arminru, bogdandrutu, carlosalberto, jkwatson, Oberon00, pavolloffay, thisthat and tylerbenson as code owners June 24, 2021 06:48

Test cleanup

256b441

ghost approved these changes Jun 24, 2021

View reviewed changes

Merge branch 'main' of github.com:open-telemetry/opentelemetry-java i…

4ecc35c

…nto xray-remote-sampler

jkwatson reviewed Jun 25, 2021

View reviewed changes

jkwatson approved these changes Jun 25, 2021

View reviewed changes

jmacd approved these changes Jun 25, 2021

View reviewed changes

Don't spam logs

2254dd7

anuraaga merged commit f83a953 into open-telemetry:main Jun 27, 2021

This was referenced Dec 19, 2021

Temurin JDK #4011

Merged

use Eclipse Temurin JDK docker image #4012

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add the actual XRay remote sampler which polls rules and orders / app… #3343

Add the actual XRay remote sampler which polls rules and orders / app… #3343

anuraaga commented Jun 24, 2021

codecov bot commented Jun 24, 2021 •

edited

Loading

ghost Jun 24, 2021

jkwatson Jun 25, 2021

ghost Jun 24, 2021

jkwatson Jun 25, 2021

ghost Jun 24, 2021

jkwatson Jun 25, 2021

anuraaga Jun 26, 2021

jkwatson commented Jun 24, 2021

anuraaga commented Jun 24, 2021

jkwatson commented Jun 24, 2021 •

edited

Loading

jkwatson Jun 25, 2021

anuraaga Jun 26, 2021

jkwatson left a comment

jmacd left a comment •

edited

Loading

anuraaga commented Jun 26, 2021

Add the actual XRay remote sampler which polls rules and orders / app… #3343

Add the actual XRay remote sampler which polls rules and orders / app… #3343

Conversation

anuraaga commented Jun 24, 2021

codecov bot commented Jun 24, 2021 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkwatson commented Jun 24, 2021

anuraaga commented Jun 24, 2021

jkwatson commented Jun 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkwatson left a comment

Choose a reason for hiding this comment

jmacd left a comment • edited Loading

Choose a reason for hiding this comment

anuraaga commented Jun 26, 2021

codecov bot commented Jun 24, 2021 •

edited

Loading

jkwatson commented Jun 24, 2021 •

edited

Loading

jmacd left a comment •

edited

Loading